Overview

Brought to you by YData

Dataset statistics

Statistic                        Dataset A   Dataset B
Number of variables              12          12
Number of observations           446         446
Missing cells                    443         442
Missing cells (%)                8.3%        8.3%
Duplicate rows                   0           0
Duplicate rows (%)               0.0%        0.0%
Total size in memory             45.3 KiB    45.3 KiB
Average record size in memory    104.0 B     104.0 B
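
The Missing cells (%) figure can be checked against the other counts in this table: missing cells divided by the total number of cells (variables times observations). The following is a minimal sketch of that arithmetic; `missing_cells_pct` is a hypothetical helper written for illustration, not a ydata-profiling function.

```python
def missing_cells_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells, in percent, over all cells of the table."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


# Counts taken from the Dataset statistics table above.
pct_a = missing_cells_pct(443, 12, 446)
pct_b = missing_cells_pct(442, 12, 446)
print(f"Dataset A: {pct_a:.1f}%")
print(f"Dataset B: {pct_b:.1f}%")
```

Both datasets round to the same 8.3% even though their missing-cell counts differ by one.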

Variable types

Type           Dataset A   Dataset B
Numeric        5           5
Categorical    4           4
Text           3           3

Alerts

Dataset A                                         Dataset B                                         Alert
Sex is highly overall correlated with Survived    Sex is highly overall correlated with Survived    High Correlation
Survived is highly overall correlated with Sex    Survived is highly overall correlated with Sex    High Correlation
Age has 92 (20.6%) missing values                 Age has 91 (20.4%) missing values                 Missing
Cabin has 350 (78.5%) missing values              Cabin has 351 (78.7%) missing values              Missing
PassengerId has unique values                     PassengerId has unique values                     Unique
Name has unique values                            Name has unique values                            Unique
SibSp has 319 (71.5%) zeros                       SibSp has 302 (67.7%) zeros                       Zeros
Parch has 361 (80.9%) zeros                       Parch has 337 (75.6%) zeros                       Zeros
Fare has 11 (2.5%) zeros                          Fare has 7 (1.6%) zeros                           Zeros

Reproduction

Statistic                 Dataset A                     Dataset B
Analysis started          2024-07-15 18:27:12.187124    2024-07-15 18:27:15.548406
Analysis finished         2024-07-15 18:27:15.547185    2024-07-15 18:27:18.622790
Duration                  3.36 seconds                  3.07 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration    config.json                   config.json
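
The Duration row is the difference between the finish and start timestamps. As a rough check, the sketch below recomputes it with the standard library; the helper name `duration_seconds` and the format string are assumptions made for this example.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table (assumed from its values).
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two timestamps in the report's format."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2024-07-15 18:27:12.187124", "2024-07-15 18:27:15.547185")
dur_b = duration_seconds("2024-07-15 18:27:15.548406", "2024-07-15 18:27:18.622790")
print(f"Dataset A: {dur_a:.2f} seconds")
print(f"Dataset B: {dur_b:.2f} seconds")
```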

Variables

PassengerId
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            455.32287    430.2287
Minimum         1            1
Maximum         891          891
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A   Dataset B
Minimum                      1           1
5-th percentile              44.25       43.25
Q1                           219.25      204.25
median                       465.5       410
Q3                           686.5       654.75
95-th percentile             841.5       850.25
Maximum                      891         891
Range                        890         890
Interquartile range (IQR)    467.25      450.5
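
Range and IQR are derived from the other quantile rows: range is maximum minus minimum, and IQR is Q3 minus Q1. A small sketch of that derivation for PassengerId follows; the function name `spread` is hypothetical and used only for illustration.

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Derive range and interquartile range from four quantile values."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# PassengerId quantiles from the table above.
spread_a = spread(minimum=1, q1=219.25, q3=686.5, maximum=891)
spread_b = spread(minimum=1, q1=204.25, q3=654.75, maximum=891)
print(spread_a)
print(spread_b)
```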

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 259.72165        261.00336
Coefficient of variation (CV)      0.57041204       0.60666189
Kurtosis                           -1.2467364       -1.2234281
Mean                               455.32287        430.2287
Median Absolute Deviation (MAD)    236              225
Skewness                           -0.05590083      0.089698726
Sum                                203074           191882
Variance                           67455.334        68122.752
Monotonicity                       Not monotonic    Not monotonic
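
Several descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of non-missing observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. The sketch below checks those identities against the PassengerId figures; `derived_stats` is a hypothetical helper, and small differences are expected because the report prints rounded values.

```python
def derived_stats(std: float, total: float, n: int) -> dict:
    """Recompute mean, CV and variance from std, sum and observation count."""
    mean = total / n
    return {"mean": mean, "cv": std / mean, "variance": std ** 2}


# PassengerId: standard deviation, sum and 446 non-missing observations.
stats_a = derived_stats(std=259.72165, total=203074, n=446)
stats_b = derived_stats(std=261.00336, total=191882, n=446)
for name, stats in (("Dataset A", stats_a), ("Dataset B", stats_b)):
    print(name, {key: round(value, 5) for key, value in stats.items()})
```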
Histogram with fixed size bins (bins=50)

Common Values

Dataset A

Value                 Count   Frequency (%)
373                   1       0.2%
14                    1       0.2%
299                   1       0.2%
685                   1       0.2%
532                   1       0.2%
842                   1       0.2%
97                    1       0.2%
321                   1       0.2%
570                   1       0.2%
2                     1       0.2%
Other values (436)    436     97.8%

Dataset B

Value                 Count   Frequency (%)
13                    1       0.2%
762                   1       0.2%
565                   1       0.2%
756                   1       0.2%
142                   1       0.2%
475                   1       0.2%
813                   1       0.2%
684                   1       0.2%
570                   1       0.2%
2                     1       0.2%
Other values (436)    436     97.8%

Lowest values

Dataset A

Value   Count   Frequency (%)
1       1       0.2%
2       1       0.2%
4       1       0.2%
5       1       0.2%
7       1       0.2%
10      1       0.2%
12      1       0.2%
14      1       0.2%
15      1       0.2%
20      1       0.2%

Dataset B

Value   Count   Frequency (%)
1       1       0.2%
2       1       0.2%
4       1       0.2%
6       1       0.2%
7       1       0.2%
8       1       0.2%
11      1       0.2%
13      1       0.2%
14      1       0.2%
23      1       0.2%

Survived
Categorical

Statistic       Dataset A   Dataset B
Distinct        2           2
Distinct (%)    0.4%        0.4%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Category counts, Dataset A: 0 (280), 1 (166)
Category counts, Dataset B: 0 (271), 1 (175)

Length

Statistic        Dataset A   Dataset B
Max length       1           1
Median length    1           1
Mean length      1           1
Min length       1           1

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       446         446
Distinct characters    2           2
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
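
To illustrate what a per-character property looks like, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is only an approximation of the idea and not the report's own implementation: the standard library exposes the general category but has no script or block lookup, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter


def category_counts(values) -> Counter:
    """Count Unicode general categories over every character of every value."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            counts[unicodedata.category(ch)] += 1
    return counts


# Survived holds single digits, so every character is a decimal number ("Nd").
print(category_counts(["0", "0", "1", "1", "0"]))
# A cabin-like string mixes uppercase letters, digits and a space separator.
print(category_counts(["C23 C25"]))
```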

Unique

Statistic     Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

Row        Dataset A   Dataset B
1st row    0           0
2nd row    0           1
3rd row    1           1
4th row    1           0
5th row    0           0

Common Values

Value   Dataset A count   Dataset A frequency (%)   Dataset B count   Dataset B frequency (%)
0       280               62.8%                     271               60.8%
1       166               37.2%                     175               39.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values of Survived for Dataset A and Dataset B; the counts are those listed under Common Values]

Most occurring characters

Value   Dataset A count   Dataset A frequency (%)   Dataset B count   Dataset B frequency (%)
0       280               62.8%                     271               60.8%
1       166               37.2%                     175               39.2%

Most occurring categories, scripts, and blocks

Each dataset has a single Unicode category, script, and block (each labelled "(unknown)" in the report), covering all 446 characters (100.0%) in both Dataset A and Dataset B. The most frequent characters per category, per script, and per block are therefore identical to the table above.

Pclass
Categorical

Statistic       Dataset A   Dataset B
Distinct        3           3
Distinct (%)    0.7%        0.7%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Category counts, Dataset A: 3 (253), 1 (109), 2 (84)
Category counts, Dataset B: 3 (257), 1 (99), 2 (90)

Length

Statistic        Dataset A   Dataset B
Max length       1           1
Median length    1           1
Mean length      1           1
Min length       1           1

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       446         446
Distinct characters    3           3
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

Row        Dataset A   Dataset B
1st row    3           3
2nd row    1           2
3rd row    1           2
4th row    1           3
5th row    1           3

Common Values

Value   Dataset A count   Dataset A frequency (%)   Dataset B count   Dataset B frequency (%)
3       253               56.7%                     257               57.6%
1       109               24.4%                     99                22.2%
2       84                18.8%                     90                20.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values of Pclass for Dataset A and Dataset B; the counts are those listed under Common Values]

Most occurring characters

Value   Dataset A count   Dataset A frequency (%)   Dataset B count   Dataset B frequency (%)
3       253               56.7%                     257               57.6%
1       109               24.4%                     99                22.2%
2       84                18.8%                     90                20.2%

Most occurring categories, scripts, and blocks

Each dataset has a single Unicode category, script, and block (each labelled "(unknown)" in the report), covering all 446 characters (100.0%) in both Dataset A and Dataset B. The most frequent characters per category, per script, and per block are therefore identical to the table above.

Name
Text

Statistic       Dataset A   Dataset B
Distinct        446         446
Distinct (%)    100.0%      100.0%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Length

Statistic        Dataset A   Dataset B
Max length       67          65
Median length    45          47
Mean length      26.59417    27.338565
Min length       12          12

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       11861       12193
Distinct characters    60          59
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        446         446
Unique (%)    100.0%      100.0%

Sample

Row        Dataset A                           Dataset B
1st row    Beavan, Mr. William Thomas          Saundercock, Mr. William Henry
2nd row    Meyer, Mr. Edgar Joseph             Kantor, Mrs. Sinai (Miriam Sternin)
3rd row    Hawksford, Mr. Walter James         Mellinger, Mrs. (Elizabeth Anne Maidment)
4th row    Sloper, Mr. William Thompson        Kallio, Mr. Nikolai Erland
5th row    Butt, Major. Archibald Willingham   Kelly, Mr. James

Words, Dataset A

Value                 Count   Frequency (%)
mr                    269     15.0%
miss                  97      5.4%
mrs                   57      3.2%
william               27      1.5%
john                  19      1.1%
henry                 15      0.8%
master                15      0.8%
george                13      0.7%
james                 12      0.7%
thomas                11      0.6%
Other values (909)    1257    70.1%

Words, Dataset B

Value                 Count   Frequency (%)
mr                    260     14.3%
miss                  98      5.4%
mrs                   61      3.3%
william               32      1.8%
john                  26      1.4%
master                20      1.1%
henry                 19      1.0%
charles               12      0.7%
george                10      0.5%
joseph                10      0.5%
Other values (903)    1273    69.9%

Most occurring characters

Dataset A

Value                Count   Frequency (%)
(space)              1346    11.3%
r                    955     8.1%
e                    838     7.1%
a                    817     6.9%
i                    669     5.6%
s                    659     5.6%
n                    651     5.5%
M                    568     4.8%
l                    521     4.4%
o                    490     4.1%
Other values (50)    4347    36.6%

Dataset B

Value                Count   Frequency (%)
(space)              1375    11.3%
r                    998     8.2%
e                    860     7.1%
a                    850     7.0%
i                    683     5.6%
n                    665     5.5%
s                    656     5.4%
M                    571     4.7%
l                    539     4.4%
o                    517     4.2%
Other values (49)    4479    36.7%

Most occurring categories, scripts, and blocks

Each dataset has a single Unicode category, script, and block (each labelled "(unknown)" in the report), covering all characters: 11861 (100.0%) in Dataset A and 12193 (100.0%) in Dataset B. The most frequent characters per category, per script, and per block are therefore identical to the tables above.

Sex
Categorical

Statistic       Dataset A   Dataset B
Distinct        2           2
Distinct (%)    0.4%        0.4%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Category counts, Dataset A: male (291), female (155)
Category counts, Dataset B: male (284), female (162)

Length

Statistic        Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.6950673    4.7264574
Min length       4            4

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       2094        2108
Distinct characters    5           5
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
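
For Sex, the Total characters and Mean length figures can be reproduced from the category counts alone, since "male" has 4 characters and "female" has 6. The sketch below does that arithmetic; `length_stats` is a hypothetical helper introduced for this example.

```python
def length_stats(counts: dict) -> tuple:
    """Return (total characters, mean length) for a value -> count mapping."""
    total_chars = sum(len(value) * count for value, count in counts.items())
    n_values = sum(counts.values())
    return total_chars, total_chars / n_values


# Category counts reported for Sex in each dataset.
total_a, mean_a = length_stats({"male": 291, "female": 155})
total_b, mean_b = length_stats({"male": 284, "female": 162})
print(total_a, round(mean_a, 7))
print(total_b, round(mean_b, 7))
```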

Unique

Statistic     Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

Row        Dataset A   Dataset B
1st row    male        male
2nd row    male        female
3rd row    male        female
4th row    male        male
5th row    male        male

Common Values

Value     Dataset A count   Dataset A frequency (%)   Dataset B count   Dataset B frequency (%)
male      291               65.2%                     284               63.7%
female    155               34.8%                     162               36.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values of Sex for Dataset A and Dataset B; the counts are those listed under Common Values]

Most occurring characters

Value   Dataset A count   Dataset A frequency (%)   Dataset B count   Dataset B frequency (%)
e       601               28.7%                     608               28.8%
m       446               21.3%                     446               21.2%
a       446               21.3%                     446               21.2%
l       446               21.3%                     446               21.2%
f       155               7.4%                      162               7.7%

Most occurring categories, scripts, and blocks

Each dataset has a single Unicode category, script, and block (each labelled "(unknown)" in the report), covering all characters: 2094 (100.0%) in Dataset A and 2108 (100.0%) in Dataset B. The most frequent characters per category, per script, and per block are therefore identical to the table above.

Age
Real number (ℝ)

Statistic       Dataset A   Dataset B
Distinct        77          74
Distinct (%)    21.8%       20.8%
Missing         92          91
Missing (%)     20.6%       20.4%
Infinite        0           0
Infinite (%)    0.0%        0.0%
Mean            29.95435    28.65707
Minimum         0.42        0.42
Maximum         71          74
Zeros           0           0
Zeros (%)       0.0%        0.0%
Negative        0           0
Negative (%)    0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Quantile statistics

Statistic                    Dataset A   Dataset B
Minimum                      0.42        0.42
5-th percentile              4           4
Q1                           21          19
median                       29          28
Q3                           39          36
95-th percentile             55.675      52.6
Maximum                      71          74
Range                        70.58       73.58
Interquartile range (IQR)    18          17

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 14.495276        14.027763
Coefficient of variation (CV)      0.48391221       0.48950442
Kurtosis                           0.0066627419     0.48710028
Mean                               29.95435         28.65707
Median Absolute Deviation (MAD)    9                8
Skewness                           0.32661672       0.44577308
Sum                                10603.84         10173.26
Variance                           210.11303        196.77812
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Dataset A

Value                Count   Frequency (%)
24                   16      3.6%
19                   15      3.4%
30                   15      3.4%
28                   13      2.9%
22                   13      2.9%
21                   12      2.7%
18                   12      2.7%
25                   12      2.7%
35                   11      2.5%
32                   11      2.5%
Other values (67)    224     50.2%
(Missing)            92      20.6%

Dataset B

Value                Count   Frequency (%)
24                   20      4.5%
18                   17      3.8%
25                   16      3.6%
28                   14      3.1%
30                   13      2.9%
36                   12      2.7%
21                   12      2.7%
16                   11      2.5%
19                   11      2.5%
35                   10      2.2%
Other values (64)    219     49.1%
(Missing)            91      20.4%

Lowest values

Dataset A

Value   Count   Frequency (%)
0.42    1       0.2%
0.67    1       0.2%
0.75    2       0.4%
0.83    1       0.2%
0.92    1       0.2%
1       3       0.7%
2       4       0.9%
3       3       0.7%
4       3       0.7%
6       3       0.7%

Dataset B

Value   Count   Frequency (%)
0.42    1       0.2%
0.67    1       0.2%
0.75    1       0.2%
0.92    1       0.2%
1       4       0.9%
2       5       1.1%
3       3       0.7%
4       4       0.9%
5       3       0.7%
7       1       0.2%

SibSp
Real number (ℝ)

Statistic       Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.43273543    0.52690583
Minimum         0             0
Maximum         8             8
Zeros           319           302
Zeros (%)       71.5%         67.7%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

Statistic                    Dataset A   Dataset B
Minimum                      0           0
5-th percentile              0           0
Q1                           0           0
median                       0           0
Q3                           1           1
95-th percentile             2           2.75
Maximum                      8           8
Range                        8           8
Interquartile range (IQR)    1           1

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 0.9087367        1.0904871
Coefficient of variation (CV)      2.0999822        2.0696053
Kurtosis                           16.242674        16.902633
Mean                               0.43273543       0.52690583
Median Absolute Deviation (MAD)    0                0
Skewness                           3.404427         3.5811554
Sum                                193              235
Variance                           0.82580239       1.1891621
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=7)

Common Values

Dataset A

Value   Count   Frequency (%)
0       319     71.5%
1       96      21.5%
2       12      2.7%
3       9       2.0%
4       7       1.6%
5       2       0.4%
8       1       0.2%

Dataset B

Value   Count   Frequency (%)
0       302     67.7%
1       107     24.0%
2       14      3.1%
4       10      2.2%
3       7       1.6%
5       3       0.7%
8       3       0.7%

Lowest values

Dataset A

Value   Count   Frequency (%)
0       319     71.5%
1       96      21.5%
2       12      2.7%
3       9       2.0%
4       7       1.6%
5       2       0.4%
8       1       0.2%

Dataset B

Value   Count   Frequency (%)
0       302     67.7%
1       107     24.0%
2       14      3.1%
3       7       1.6%
4       10      2.2%
5       3       0.7%
8       3       0.7%

Parch
Real number (ℝ)

Statistic       Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.31838565    0.38565022
Minimum         0             0
Maximum         6             6
Zeros           361           337
Zeros (%)       80.9%         75.6%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

Statistic                    Dataset A   Dataset B
Minimum                      0           0
5-th percentile              0           0
Q1                           0           0
median                       0           0
Q3                           0           0
95-th percentile             2           2
Maximum                      6           6
Range                        6           6
Interquartile range (IQR)    0           0

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 0.7800738        0.81512136
Coefficient of variation (CV)      2.450091         2.1136286
Kurtosis                           13.949035        11.765569
Mean                               0.31838565       0.38565022
Median Absolute Deviation (MAD)    0                0
Skewness                           3.2857709        2.9356946
Sum                                142              172
Variance                           0.60851514       0.66442283
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=7)

Common Values

Dataset A

Value   Count   Frequency (%)
0       361     80.9%
1       44      9.9%
2       34      7.6%
5       2       0.4%
4       2       0.4%
3       2       0.4%
6       1       0.2%

Dataset B

Value   Count   Frequency (%)
0       337     75.6%
1       62      13.9%
2       41      9.2%
5       3       0.7%
3       1       0.2%
4       1       0.2%
6       1       0.2%

Lowest values

Dataset A

Value   Count   Frequency (%)
0       361     80.9%
1       44      9.9%
2       34      7.6%
3       2       0.4%
4       2       0.4%
5       2       0.4%
6       1       0.2%

Dataset B

Value   Count   Frequency (%)
0       337     75.6%
1       62      13.9%
2       41      9.2%
3       1       0.2%
4       1       0.2%
5       3       0.7%
6       1       0.2%

Ticket
Text

Statistic       Dataset A   Dataset B
Distinct        391         384
Distinct (%)    87.7%       86.1%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       18           18
Median length    17           17
Mean length      6.8004484    6.8520179
Min length       3            3

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       3033        3056
Distinct characters    35          31
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        348         333
Unique (%)    78.0%       74.7%

Sample

Row        Dataset A   Dataset B
1st row    323951      A/5. 2151
2nd row    PC 17604    244367
3rd row    16988       250644
4th row    113788      STON/O 2. 3101274
5th row    113050      363592

Words, Dataset A

Value                 Count   Frequency (%)
pc                    30      5.3%
a/5                   12      2.1%
c.a                   11      1.9%
ston/o                7       1.2%
2                     7       1.2%
soton/o.q             6       1.1%
soton/oq              5       0.9%
1601                  5       0.9%
sc/paris              5       0.9%
347088                5       0.9%
Other values (409)    472     83.5%

Words, Dataset B

Value                 Count   Frequency (%)
pc                    31      5.4%
c.a                   15      2.6%
ca                    8       1.4%
ston/o                8       1.4%
2                     8       1.4%
sc/paris              7       1.2%
a/5                   5       0.9%
3101295               5       0.9%
2144                  4       0.7%
a/4                   4       0.7%
Other values (406)    480     83.5%

Most occurring characters

Dataset A

Value                Count   Frequency (%)
3                    363     12.0%
1                    346     11.4%
2                    285     9.4%
7                    259     8.5%
4                    236     7.8%
0                    209     6.9%
6                    200     6.6%
5                    199     6.6%
9                    166     5.5%
8                    149     4.9%
Other values (25)    621     20.5%

Dataset B

Value                Count   Frequency (%)
3                    379     12.4%
1                    342     11.2%
2                    309     10.1%
7                    237     7.8%
4                    236     7.7%
6                    203     6.6%
0                    200     6.5%
5                    186     6.1%
9                    169     5.5%
8                    150     4.9%
Other values (21)    645     21.1%

Most occurring categories, scripts, and blocks

Each dataset has a single Unicode category, script, and block (each labelled "(unknown)" in the report), covering all characters: 3033 (100.0%) in Dataset A and 3056 (100.0%) in Dataset B. The most frequent characters per category, per script, and per block are therefore identical to the tables above.

Fare
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        182          175
Distinct (%)    40.8%        39.2%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            30.477382    29.24032
Minimum         0            0
Maximum         512.3292     512.3292
Zeros           11           7
Zeros (%)       2.5%         1.6%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      0            0
5-th percentile              7.125        7.225
Q1                           7.8958       7.8958
median                       13           13
Q3                           30           30.5
95-th percentile             103.19375    86.5
Maximum                      512.3292     512.3292
Range                        512.3292     512.3292
Interquartile range (IQR)    22.1042      22.6042

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 50.774303        44.604556
Coefficient of variation (CV)      1.6659667        1.5254469
Kurtosis                           39.898932        38.326349
Mean                               30.477382        29.24032
Median Absolute Deviation (MAD)    5.75             5.4104
Skewness                           5.3699376        5.0534316
Sum                                13592.912        13041.183
Variance                           2578.0298        1989.5664
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Dataset A

Value                 Count   Frequency (%)
8.05                  26      5.8%
13                    21      4.7%
7.8958                17      3.8%
10.5                  13      2.9%
7.75                  13      2.9%
7.8542                12      2.7%
0                     11      2.5%
7.925                 11      2.5%
26.55                 9       2.0%
26                    9       2.0%
Other values (172)    304     68.2%

Dataset B

Value                 Count   Frequency (%)
7.8958                24      5.4%
7.75                  23      5.2%
8.05                  22      4.9%
13                    20      4.5%
10.5                  15      3.4%
26                    13      2.9%
7.925                 10      2.2%
7.8542                9       2.0%
7.225                 8       1.8%
0                     7       1.6%
Other values (165)    295     66.1%

Lowest values

Dataset A

Value     Count   Frequency (%)
0         11      2.5%
6.2375    1       0.2%
6.45      1       0.2%
6.4958    1       0.2%
6.75      1       0.2%
6.8583    1       0.2%
7.0458    1       0.2%
7.05      5       1.1%
7.125     3       0.7%
7.1417    1       0.2%

Dataset B

Value     Count   Frequency (%)
0         7       1.6%
5         1       0.2%
6.4375    1       0.2%
6.4958    1       0.2%
6.75      1       0.2%
6.8583    1       0.2%
6.95      1       0.2%
6.975     1       0.2%
7.05      3       0.7%
7.0542    1       0.2%

Cabin
Text

Statistic       Dataset A   Dataset B
Distinct        85          77
Distinct (%)    88.5%       81.1%
Missing         350         351
Missing (%)     78.5%       78.7%
Memory size     7.0 KiB     7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       11           15
Median length    3            3
Mean length      3.7291667    3.7894737
Min length       1            1

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       358         360
Distinct characters    18          19
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        75          62
Unique (%)    78.1%       65.3%

Sample

Row        Dataset A   Dataset B
1st row    D45         A24
2nd row    A6          B94
3rd row    B38         A34
4th row    G6          C128
5th row    E8          A20

Words, Dataset A

Value                Count   Frequency (%)
c23                  3       2.6%
c27                  3       2.6%
f                    3       2.6%
c25                  3       2.6%
b35                  2       1.8%
e101                 2       1.8%
g6                   2       1.8%
d35                  2       1.8%
e44                  2       1.8%
c92                  2       1.8%
Other values (86)    90      78.9%

Words, Dataset B

Value                Count   Frequency (%)
g6                   4       3.4%
c26                  3       2.6%
c22                  3       2.6%
b55                  2       1.7%
f                    2       1.7%
c123                 2       1.7%
f2                   2       1.7%
c68                  2       1.7%
e44                  2       1.7%
b35                  2       1.7%
Other values (78)    92      79.3%

Most occurring characters

Dataset A

Value               Count   Frequency (%)
2                   41      11.5%
C                   40      11.2%
1                   33      9.2%
3                   26      7.3%
6                   24      6.7%
5                   23      6.4%
B                   23      6.4%
4                   21      5.9%
D                   18      5.0%
(space)             18      5.0%
Other values (8)    91      25.4%

Dataset B

Value               Count   Frequency (%)
2                   41      11.4%
C                   35      9.7%
B                   34      9.4%
1                   28      7.8%
5                   28      7.8%
3                   27      7.5%
6                   27      7.5%
(space)             21      5.8%
4                   19      5.3%
E                   16      4.4%
Other values (9)    84      23.3%

Most occurring categories, scripts, and blocks

Each dataset has a single Unicode category, script, and block (each labelled "(unknown)" in the report), covering all characters: 358 (100.0%) in Dataset A and 360 (100.0%) in Dataset B. The most frequent characters per category, per script, and per block are therefore identical to the tables above.

Embarked
Categorical

Statistic       Dataset A   Dataset B
Distinct        3           3
Distinct (%)    0.7%        0.7%
Missing         1           0
Missing (%)     0.2%        0.0%
Memory size     7.0 KiB     7.0 KiB

Category counts, Dataset A: S (332), C (81), Q (32)
Category counts, Dataset B: S (315), C (90), Q (41)

Length

Statistic        Dataset A   Dataset B
Max length       1           1
Median length    1           1
Mean length      1           1
Min length       1           1

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       445         446
Distinct characters    3           3
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

Row        Dataset A   Dataset B
1st row    S           S
2nd row    C           S
3rd row    S           S
4th row    S           S
5th row    S           S

Common Values

Dataset A

Value        Count   Frequency (%)
S            332     74.4%
C            81      18.2%
Q            32      7.2%
(Missing)    1       0.2%

Dataset B

Value   Count   Frequency (%)
S       315     70.6%
C       90      20.2%
Q       41      9.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values of Embarked for Dataset A and Dataset B]

Value   Dataset A count   Dataset A frequency (%)   Dataset B count   Dataset B frequency (%)
s       332               74.6%                     315               70.6%
c       81                18.2%                     90                20.2%
q       32                7.2%                      41                9.2%

Most occurring characters

ValueCountFrequency (%)
Value   Dataset A count   Dataset A frequency (%)   Dataset B count   Dataset B frequency (%)
S       332               74.6%                     315               70.6%
C       81                18.2%                     90                20.2%
Q       32                7.2%                      41                9.2%

Most occurring categories, scripts, and blocks

Each dataset has a single Unicode category, script, and block (each labelled "(unknown)" in the report), covering all characters: 445 (100.0%) in Dataset A and 446 (100.0%) in Dataset B. The most frequent characters per category, per script, and per block are therefore identical to the table above.

Interactions

[Figures: pairwise interaction plots between variables, one per variable pair for Dataset A and for Dataset B]

Dataset A

2024-07-15T18:27:14.538363image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/

Dataset B

2024-07-15T18:27:17.664583image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/

Correlations

[Correlation plots for Dataset A and Dataset B; images not preserved.]

Dataset A

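| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.133 | -0.248 | 0.017 | 0.238 | 0.160 | -0.122 | 0.153 |
| Embarked | 0.000 | 1.000 | 0.175 | 0.039 | 0.032 | 0.221 | 0.159 | 0.057 | 0.204 |
| Fare | 0.133 | 0.175 | 1.000 | 0.359 | 0.008 | 0.455 | 0.165 | 0.427 | 0.316 |
| Parch | -0.248 | 0.039 | 0.359 | 1.000 | 0.006 | 0.000 | 0.241 | 0.440 | 0.117 |
| PassengerId | 0.017 | 0.032 | 0.008 | 0.006 | 1.000 | 0.007 | 0.000 | -0.033 | 0.165 |
| Pclass | 0.238 | 0.221 | 0.455 | 0.000 | 0.007 | 1.000 | 0.125 | 0.115 | 0.347 |
| Sex | 0.160 | 0.159 | 0.165 | 0.241 | 0.000 | 0.125 | 1.000 | 0.261 | 0.523 |
| SibSp | -0.122 | 0.057 | 0.427 | 0.440 | -0.033 | 0.115 | 0.261 | 1.000 | 0.183 |
| Survived | 0.153 | 0.204 | 0.316 | 0.117 | 0.165 | 0.347 | 0.523 | 0.183 | 1.000 |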

Dataset B

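| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.085 | 0.107 | -0.253 | 0.054 | 0.262 | 0.092 | -0.177 | 0.137 |
| Embarked | 0.085 | 1.000 | 0.169 | 0.060 | 0.073 | 0.244 | 0.159 | 0.088 | 0.195 |
| Fare | 0.107 | 0.169 | 1.000 | 0.442 | -0.015 | 0.488 | 0.219 | 0.500 | 0.297 |
| Parch | -0.253 | 0.060 | 0.442 | 1.000 | -0.001 | 0.000 | 0.251 | 0.469 | 0.131 |
| PassengerId | 0.054 | 0.073 | -0.015 | -0.001 | 1.000 | 0.111 | 0.088 | -0.081 | 0.000 |
| Pclass | 0.262 | 0.244 | 0.488 | 0.000 | 0.111 | 1.000 | 0.115 | 0.120 | 0.278 |
| Sex | 0.092 | 0.159 | 0.219 | 0.251 | 0.088 | 0.115 | 1.000 | 0.180 | 0.542 |
| SibSp | -0.177 | 0.088 | 0.500 | 0.469 | -0.081 | 0.120 | 0.180 | 1.000 | 0.129 |
| Survived | 0.137 | 0.195 | 0.297 | 0.131 | 0.000 | 0.278 | 0.542 | 0.129 | 1.000 |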
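
To make the comparison between the two correlation tables easier to read, the following is a minimal Python sketch that ranks variables by how much their coefficient with Survived differs between Dataset A and Dataset B. The values are copied from the Survived rows of the two tables above; the function name `coefficient_shifts` is a hypothetical helper, not part of ydata-profiling, and the sketch makes no assumption about which correlation measure the report used for each cell.

```python
# Coefficient of each variable with Survived, copied from the two tables above.
survived_a = {"Age": 0.153, "Embarked": 0.204, "Fare": 0.316, "Parch": 0.117,
              "PassengerId": 0.165, "Pclass": 0.347, "Sex": 0.523, "SibSp": 0.183}
survived_b = {"Age": 0.137, "Embarked": 0.195, "Fare": 0.297, "Parch": 0.131,
              "PassengerId": 0.000, "Pclass": 0.278, "Sex": 0.542, "SibSp": 0.129}


def coefficient_shifts(a, b):
    """Return (variable, a - b) pairs sorted by decreasing absolute difference."""
    diffs = {name: round(a[name] - b[name], 3) for name in a}
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


shifts = coefficient_shifts(survived_a, survived_b)
for name, delta in shifts:
    # Signed difference, Dataset A minus Dataset B.
    print(f"{name:<12}{delta:+.3f}")
```

The largest shift is for PassengerId (0.165 in Dataset A versus 0.000 in Dataset B), while the Sex coefficient, the only one flagged as a high correlation in both datasets, changes by less than 0.02.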

Missing values

Dataset A, Dataset B

[Plots not preserved.] A simple visualization of nullity by column.

Dataset A, Dataset B

[Plots not preserved.] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset A, Dataset B

[Plots not preserved.] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
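
As a rough illustration of nullity correlation, the sketch below computes the Pearson correlation between the missing-value indicators of two columns. This is one common way to define it and is assumed here rather than confirmed by the report. The function name `nullity_correlation` and the toy columns are hypothetical and are not taken from Dataset A or Dataset B.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    # 1.0 marks a missing cell, 0.0 a present one.
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is fully present or fully missing
    return cov / sqrt(var_a * var_b)


# Hypothetical toy columns; None marks a missing cell.
age = [22.0, None, 35.0, None, 54.0, 2.0]
cabin = ["C85", None, None, None, "E46", None]
print(nullity_correlation(age, cabin))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and a value near 0 means missingness in one says little about the other.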

Sample

Dataset A

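| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 372 | 373 | 0 | 3 | Beavan, Mr. William Thomas | male | 19.0 | 0 | 0 | 323951 | 8.0500 | NaN | S |
| 34 | 35 | 0 | 1 | Meyer, Mr. Edgar Joseph | male | 28.0 | 1 | 0 | PC 17604 | 82.1708 | NaN | C |
| 740 | 741 | 1 | 1 | Hawksford, Mr. Walter James | male | NaN | 0 | 0 | 16988 | 30.0000 | D45 | S |
| 23 | 24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28.0 | 0 | 0 | 113788 | 35.5000 | A6 | S |
| 536 | 537 | 0 | 1 | Butt, Major. Archibald Willingham | male | 45.0 | 0 | 0 | 113050 | 26.5500 | B38 | S |
| 251 | 252 | 0 | 3 | Strom, Mrs. Wilhelm (Elna Matilda Persson) | female | 29.0 | 1 | 1 | 347054 | 10.4625 | G6 | S |
| 809 | 810 | 1 | 1 | Chambers, Mrs. Norman Campbell (Bertha Griggs) | female | 33.0 | 1 | 0 | 113806 | 53.1000 | E8 | S |
| 564 | 565 | 0 | 3 | Meanwell, Miss. (Marion Ogden) | female | NaN | 0 | 0 | SOTON/O.Q. 392087 | 8.0500 | NaN | S |
| 596 | 597 | 1 | 2 | Leitch, Miss. Jessie Wills | female | NaN | 0 | 0 | 248727 | 33.0000 | NaN | S |
| 173 | 174 | 0 | 3 | Sivola, Mr. Antti Wilhelm | male | 21.0 | 0 | 0 | STON/O 2. 3101280 | 7.9250 | NaN | S |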

Dataset B

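| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S |
| 316 | 317 | 1 | 2 | Kantor, Mrs. Sinai (Miriam Sternin) | female | 24.0 | 1 | 0 | 244367 | 26.0000 | NaN | S |
| 272 | 273 | 1 | 2 | Mellinger, Mrs. (Elizabeth Anne Maidment) | female | 41.0 | 0 | 1 | 250644 | 19.5000 | NaN | S |
| 433 | 434 | 0 | 3 | Kallio, Mr. Nikolai Erland | male | 17.0 | 0 | 0 | STON/O 2. 3101274 | 7.1250 | NaN | S |
| 696 | 697 | 0 | 3 | Kelly, Mr. James | male | 44.0 | 0 | 0 | 363592 | 8.0500 | NaN | S |
| 656 | 657 | 0 | 3 | Radeff, Mr. Alexander | male | NaN | 0 | 0 | 349223 | 7.8958 | NaN | S |
| 416 | 417 | 1 | 2 | Drew, Mrs. James Vivian (Lulu Thorne Christian) | female | 34.0 | 1 | 1 | 28220 | 32.5000 | NaN | S |
| 120 | 121 | 0 | 2 | Hickman, Mr. Stanley George | male | 21.0 | 2 | 0 | S.O.C. 14879 | 73.5000 | NaN | S |
| 629 | 630 | 0 | 3 | O'Connell, Mr. Patrick D | male | NaN | 0 | 0 | 334912 | 7.7333 | NaN | Q |
| 603 | 604 | 0 | 3 | Torber, Mr. Ernst William | male | 44.0 | 0 | 0 | 364511 | 8.0500 | NaN | S |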

Dataset A

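| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 677 | 678 | 1 | 3 | Turja, Miss. Anna Sofia | female | 18.0 | 0 | 0 | 4138 | 9.8417 | NaN | S |
| 601 | 602 | 0 | 3 | Slabenoff, Mr. Petco | male | NaN | 0 | 0 | 349214 | 7.8958 | NaN | S |
| 621 | 622 | 1 | 1 | Kimball, Mr. Edwin Nelson Jr | male | 42.0 | 1 | 0 | 11753 | 52.5542 | D19 | S |
| 818 | 819 | 0 | 3 | Holm, Mr. John Fredrik Alexander | male | 43.0 | 0 | 0 | C 7075 | 6.4500 | NaN | S |
| 22 | 23 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15.0 | 0 | 0 | 330923 | 8.0292 | NaN | Q |
| 844 | 845 | 0 | 3 | Culumovic, Mr. Jeso | male | 17.0 | 0 | 0 | 315090 | 8.6625 | NaN | S |
| 711 | 712 | 0 | 1 | Klaber, Mr. Herman | male | NaN | 0 | 0 | 113028 | 26.5500 | C124 | S |
| 465 | 466 | 0 | 3 | Goncalves, Mr. Manuel Estanslas | male | 38.0 | 0 | 0 | SOTON/O.Q. 3101306 | 7.0500 | NaN | S |
| 605 | 606 | 0 | 3 | Lindell, Mr. Edvard Bengtsson | male | 36.0 | 1 | 0 | 349910 | 15.5500 | NaN | S |
| 864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S |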

Dataset B

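| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 309 | 310 | 1 | 1 | Francatelli, Miss. Laura Mabel | female | 30.0 | 0 | 0 | PC 17485 | 56.9292 | E36 | C |
| 566 | 567 | 0 | 3 | Stoytcheff, Mr. Ilia | male | 19.0 | 0 | 0 | 349205 | 7.8958 | NaN | S |
| 55 | 56 | 1 | 1 | Woolner, Mr. Hugh | male | NaN | 0 | 0 | 19947 | 35.5000 | C52 | S |
| 830 | 831 | 1 | 3 | Yasbeck, Mrs. Antoni (Selini Alexander) | female | 15.0 | 1 | 0 | 2659 | 14.4542 | NaN | C |
| 327 | 328 | 1 | 2 | Ball, Mrs. (Ada E Hall) | female | 36.0 | 0 | 0 | 28551 | 13.0000 | D | S |
| 57 | 58 | 0 | 3 | Novel, Mr. Mansouer | male | 28.5 | 0 | 0 | 2697 | 7.2292 | NaN | C |
| 662 | 663 | 0 | 1 | Colley, Mr. Edward Pomeroy | male | 47.0 | 0 | 0 | 5727 | 25.5875 | E58 | S |
| 790 | 791 | 0 | 3 | Keane, Mr. Andrew "Andy" | male | NaN | 0 | 0 | 12460 | 7.7500 | NaN | Q |
| 553 | 554 | 1 | 3 | Leeni, Mr. Fahim ("Philip Zenni") | male | 22.0 | 0 | 0 | 2620 | 7.2250 | NaN | C |
| 598 | 599 | 0 | 3 | Boulos, Mr. Hanna | male | NaN | 0 | 0 | 2664 | 7.2250 | NaN | C |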

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
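
For readers who want to reproduce a duplicate-row check outside the report, here is a minimal sketch that counts fully identical rows using only the Python standard library. The helper `duplicate_counts` and the sample rows are hypothetical; this is not how ydata-profiling itself implements the check.

```python
from collections import Counter


def duplicate_counts(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical rows (PassengerId, Survived, Pclass); not taken from the report.
rows = [(1, 0, 3), (2, 1, 1), (3, 1, 3)]
print(duplicate_counts(rows) or "Dataset does not contain duplicate rows.")
```

Because PassengerId is unique in both datasets, no two full rows can be identical, which is consistent with the empty result reported for Dataset A and Dataset B.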